Single channel speech separation in modulation frequency domain based on a novel pitch range estimation method
Abstract
Computational Auditory Scene Analysis (CASA) has been a focus of recent literature on speech separation from monaural mixtures. The performance of current CASA systems on voiced speech separation depends critically on the robustness of the algorithm used for pitch frequency estimation. We propose a new system that estimates the pitch (frequency) range of a target utterance and separates the voiced portions of the target speech. The algorithm first estimates the pitch range of the target speech in each frame of data in the modulation frequency domain, and then uses the estimated pitch range to segregate the target speech. Pitch range estimation is based on an onset and offset algorithm. Speech separation is performed by filtering the mixture signal with a mask extracted from the modulation spectrogram. A systematic evaluation shows that the proposed system extracts the majority of the target speech signal with minimal interference and outperforms previous systems in both pitch extraction and voiced speech separation.
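As a rough illustration of what working in the modulation frequency domain can look like, the sketch below takes a short-time magnitude spectrum, applies a second Fourier transform along time in each acoustic-frequency channel, and picks the strongest modulation frequency inside a search band. It is not the authors' algorithm: the onset and offset based pitch range estimation and the mask construction described in the abstract are not reproduced here. All function names, the frame length, the hop size, and the 80 to 400 Hz search band are assumptions chosen for the example.

```python
import numpy as np


def stft_magnitude(x, frame_len=32, hop=8):
    """Magnitude STFT with a Hann window; returns shape (n_frames, n_bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def modulation_spectrum(mag, hop, fs):
    """Fourier transform of each channel envelope along the time axis.

    Returns (mod_freqs_hz, mod) where mod has shape (n_mod_freqs, n_bins).
    The frame rate fs / hop sets the highest observable modulation frequency.
    """
    envelope = mag - mag.mean(axis=0, keepdims=True)  # remove the DC term
    mod = np.fft.rfft(envelope, axis=0)
    mod_freqs = np.fft.rfftfreq(mag.shape[0], d=hop / fs)
    return mod_freqs, mod


def dominant_modulation_freq(mod_freqs, mod, f_lo, f_hi):
    """Strongest modulation frequency within [f_lo, f_hi] (assumed band)."""
    band = (mod_freqs >= f_lo) & (mod_freqs <= f_hi)
    energy = np.sum(np.abs(mod) ** 2, axis=1)  # sum over acoustic channels
    energy = np.where(band, energy, 0.0)
    return float(mod_freqs[np.argmax(energy)])


# Toy input: a 2 kHz carrier whose amplitude fluctuates at 150 Hz,
# standing in for a voiced sound with a 150 Hz pitch.
fs = 8000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.cos(2 * np.pi * 150 * t)) * np.sin(2 * np.pi * 2000 * t)

mag = stft_magnitude(x, frame_len=32, hop=8)
mod_freqs, mod = modulation_spectrum(mag, hop=8, fs=fs)
print(dominant_modulation_freq(mod_freqs, mod, 80.0, 400.0))
```

The short frame (4 ms) and small hop (1 ms) are deliberate in this sketch: the frame rate of 1000 Hz has to exceed twice the highest pitch of interest, otherwise pitch-rate envelope fluctuations would alias in the modulation axis.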
Similar resources
A Novel Frequency Domain Linearly Constrained Minimum Variance Filter for Speech Enhancement
A reliable speech enhancement method is important for speech applications as a pre-processing step to improve their overall performance. In this paper, we propose a novel frequency domain method for single channel speech enhancement. Conventional frequency domain methods usually neglect the correlation between neighboring time-frequency components of the signals. In the proposed method, we take...
Source-Filter-Based Single-Channel Speech Separation Using Pitch Information
In this paper, we investigate the source–filter-based approach for single-channel speech separation. We incorporate source-driven aspects by multi-pitch estimation in the model-driven method. For multi-pitch estimation, the factorial HMM is utilized. For modeling the vocal tract filters either vector quantization (VQ) or non-negative matrix factorization are considered. For both methods, the fi...
Robust HNR-Based Closed-Loop Pitch and Harmonic Parameters Estimation
An important problem in the speech coding framework is model parameter estimation. In most cases, parametric speech coding methods do not preserve the shape of the speech waveform. As a result, parameter estimation is not straightforward and the analysis-by-synthesis method is hardly used. A novel analysis-by-synthesis parameter estimation method for speech coders based on harmonic models is presented. We introduce...
Drum Source Separation using Percussive Feature Detection and Spectral Modulation
We present a method for the separation and resynthesis of drum sources from single channel polyphonic mixtures. The frequency domain technique involves identifying the presence of a drum using a novel percussive feature detection function, after which the short-time magnitude spectrum is estimated and scaled according to an estimated time-amplitude function derived from the percussive measure. ...
Complex tensor factorization in modulation frequency domain for single-channel speech enhancement
This paper proposes a novel method of speech enhancement using tensor factorization, which is extended from complex non-negative matrix factorization (CMF), in the modulation frequency domain. Non-negative matrix factorization (NMF) has attracted a great deal of attention as a recent approach to speech enhancement for its ease of feature detection in the acoustic frequency domain. However, prev...
Journal title:
- EURASIP J. Adv. Sig. Proc.
Volume 2012
Publication date 2012